Search Results

Documents authored by Cho, Hwan-Gue


Document
Simple Order-Isomorphic Matching Index with Expected Compact Space

Authors: Sung-Hwan Kim and Hwan-Gue Cho

Published in: LIPIcs, Volume 248, 33rd International Symposium on Algorithms and Computation (ISAAC 2022)


Abstract
In this paper, we present a novel indexing method for the order-isomorphic pattern matching problem (also known as order-preserving pattern matching, or consecutive permutation matching), in which two equal-length strings are defined to match when X[i] < X[j] iff Y[i] < Y[j] for 0 ≤ i,j < |X|. We observe an interesting relation between the order-isomorphic matching and the insertion process of a binary search tree, based on which we propose a data structure which not only has a concise structure comprised of only two wavelet trees but also provides a surprisingly simple searching algorithm. In the average case analysis, the proposed method requires 𝒪(R(T)) bits, and it is capable of answering a count query in 𝒪(R(P)) time, and reporting an occurrence in 𝒪(lg |T|) time, where T and P are the text and the pattern string, respectively; for a string X, R(X) is the total time taken for the construction of the binary search tree by successively inserting the keys X[|X|-1],⋯,X[0] at the root, and its expected value is 𝒪(|X|lgσ) where σ is the alphabet size. Furthermore, the proposed method can be viewed as a generalization of some other methods including several heuristics and restricted versions described in previous studies in the literature.

Cite as

Sung-Hwan Kim and Hwan-Gue Cho. Simple Order-Isomorphic Matching Index with Expected Compact Space. In 33rd International Symposium on Algorithms and Computation (ISAAC 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 248, pp. 61:1-61:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{kim_et_al:LIPIcs.ISAAC.2022.61,
  author =	{Kim, Sung-Hwan and Cho, Hwan-Gue},
  title =	{{Simple Order-Isomorphic Matching Index with Expected Compact Space}},
  booktitle =	{33rd International Symposium on Algorithms and Computation (ISAAC 2022)},
  pages =	{61:1--61:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-258-7},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{248},
  editor =	{Bae, Sang Won and Park, Heejin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2022.61},
  URN =		{urn:nbn:de:0030-drops-173466},
  doi =		{10.4230/LIPIcs.ISAAC.2022.61},
  annote =	{Keywords: Compact Data Structure, String Matching, Order-Preserving Matching, Suffix Array, FM-index, Binary Search Tree}
}
Document
A Compact Index for Cartesian Tree Matching

Authors: Sung-Hwan Kim and Hwan-Gue Cho

Published in: LIPIcs, Volume 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)


Abstract
Cartesian tree matching is a recently introduced string matching problem in which two strings match if their corresponding Cartesian trees are the same. It is considered appropriate to find patterns regarding their shapes especially in numerical time series data. While many related problems have been addressed, developing a compact index has received relatively less attention. In this paper, we present a 3n+o(n)-bit index that can count the number of occurrences of a Cartesian tree pattern in 𝒪(m) time where n and m are the text and pattern length. To the best of our knowledge, this work is the first 𝒪(n)-bit compact data structure for indexing for this problem.

Cite as

Sung-Hwan Kim and Hwan-Gue Cho. A Compact Index for Cartesian Tree Matching. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 18:1-18:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{kim_et_al:LIPIcs.CPM.2021.18,
  author =	{Kim, Sung-Hwan and Cho, Hwan-Gue},
  title =	{{A Compact Index for Cartesian Tree Matching}},
  booktitle =	{32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
  pages =	{18:1--18:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-186-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{191},
  editor =	{Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021.18},
  URN =		{urn:nbn:de:0030-drops-139699},
  doi =		{10.4230/LIPIcs.CPM.2021.18},
  annote =	{Keywords: String Matching, Suffix Array, FM-index, Compact Index, Cartesian Tree Matching}
}
Document
Indexing Isodirectional Pointer Sequences

Authors: Sung-Hwan Kim and Hwan-Gue Cho

Published in: LIPIcs, Volume 181, 31st International Symposium on Algorithms and Computation (ISAAC 2020)


Abstract
Many sequential and temporal data have dependency relationships among their elements, which can be represented as a sequence of pointers. In this paper, we introduce a new string matching problem with a particular type of strings, which we call isodirectional pointer sequence, in which each entry has a pointer to another entry. The proposed problem is not only a formalization of real-world dependency matching problems, but also a generalization of variants of the string matching problem such as parameterized pattern matching and Cartesian tree matching. We present a 2nlgσ+2n+o(n)-bit index that preprocesses the text T[1:n] so as to count the number of occurrences of pattern P[1:m] in 𝒪(mlgσ) where σ is the number of distinct lengths of pointers in T. Our index is also easily implementable in practice because it consists of wavelet trees and range maximum query index, which are widely used building blocks in many other compact data structures. By compressing the wavelet trees, the index can also be stored into 2nH^*₀(T)+2n+o(n) bits where H^*₀(T) is the 0-th order empirical entropy of the distribution of pointer lengths of T.

Cite as

Sung-Hwan Kim and Hwan-Gue Cho. Indexing Isodirectional Pointer Sequences. In 31st International Symposium on Algorithms and Computation (ISAAC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 181, pp. 35:1-35:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{kim_et_al:LIPIcs.ISAAC.2020.35,
  author =	{Kim, Sung-Hwan and Cho, Hwan-Gue},
  title =	{{Indexing Isodirectional Pointer Sequences}},
  booktitle =	{31st International Symposium on Algorithms and Computation (ISAAC 2020)},
  pages =	{35:1--35:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-173-3},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{181},
  editor =	{Cao, Yixin and Cheng, Siu-Wing and Li, Minming},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2020.35},
  URN =		{urn:nbn:de:0030-drops-133797},
  doi =		{10.4230/LIPIcs.ISAAC.2020.35},
  annote =	{Keywords: String Matching, Suffix Array, FM-index, Wavelet Tree, Range Minimum Query, Parameterized String Matching, Cartesian Tree Matching}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail